
[202111] [Mellanox] Fix issue: error message from system-health daemon is observed during system starting #10843

Merged (1 commit) on May 21, 2022


stephenxs (Collaborator)

Why I did it

The error message "ERR healthd: Failed to read from file /var/run/hw-management/led/led_status_capability" is observed during system startup.
The system-health daemon waits 5 minutes before it starts running.
During this delay, the only thing it does is set the LED, before the health checks themselves start.
However, the corresponding sysfs file is not yet ready when the daemon reads it, which causes the error message.
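To make the race concrete, here is a minimal Python sketch of the failing read, assuming the daemon simply opens the capability file and logs on failure. The function `read_led_capability` and the use of Python `logging` are hypothetical illustrations, not the actual healthd code; a temporary directory stands in for `/var/run/hw-management/led`.

```python
import logging
import os
import tempfile

# Hypothetical logger name chosen to echo the "healthd:" prefix in the report.
logger = logging.getLogger("healthd")


def read_led_capability(path):
    """Return the stripped contents of a sysfs-style file, or None on failure.

    Reproduces the symptom only: if hw-management has not yet created the
    file, open() raises and an error naming the path is logged.
    """
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        logger.error("Failed to read from file %s", path)
        return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sysfs_dir = tempfile.mkdtemp()  # stand-in for /var/run/hw-management/led
    path = os.path.join(sysfs_dir, "led_status_capability")

    # Early read: the file does not exist yet, so an error is logged.
    print(read_led_capability(path))  # None

    # Later read: simulate hw-management creating the file (placeholder contents).
    with open(path, "w") as f:
        f.write("example-capability\n")
    print(read_led_capability(path))  # example-capability
```

The file contents used here are placeholders; the real format of `led_status_capability` is not described in this PR.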

Signed-off-by: Stephen Sun <stephens@nvidia.com>

How I did it

Defer the system-health daemon until the hw-management service has started.
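The description does not show how the deferral is wired up. On systemd-based images, start-up ordering between services is normally expressed with `After=` (optionally combined with `Wants=` or `Requires=`) in the unit file; whether this PR uses exactly that mechanism is not stated here. As an in-process alternative, a minimal sketch, assuming the appearance of the capability file is an adequate readiness signal, could poll for it with a bounded timeout before the first LED update. `wait_for_path` and its parameters are hypothetical.

```python
import os
import tempfile
import threading
import time


def wait_for_path(path, timeout, interval=1.0):
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the path appeared in time, False otherwise.
    (Hypothetical helper; not part of the sonic-buildimage change.)
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))


if __name__ == "__main__":
    sysfs_dir = tempfile.mkdtemp()  # stand-in for /var/run/hw-management/led
    path = os.path.join(sysfs_dir, "led_status_capability")

    # Simulate hw-management creating the file shortly after the daemon starts.
    creator = threading.Timer(0.2, lambda: open(path, "w").close())
    creator.start()

    if wait_for_path(path, timeout=5.0, interval=0.05):
        print("capability file ready; safe to set the LED")
    else:
        print("timed out waiting for hw-management")
    creator.join()
```

Bounding the wait with a timeout keeps the daemon from hanging if the file never appears. Ordering at the service level, which is what the PR describes, avoids polling altogether.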

How to verify it

Run the regression tests.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111


@stephenxs (Collaborator Author)

This failure is not caused by this PR; it appears to be a generic failure.

2022-05-16T08:38:12.4161659Z ======================================================================
2022-05-16T08:38:12.4162165Z FAIL: test_back_end_asic_acl (tests.test_multinpu_cfggen.TestMultiNpuCfgGen)
2022-05-16T08:38:12.4162874Z ----------------------------------------------------------------------
2022-05-16T08:38:12.4163294Z Traceback (most recent call last):
2022-05-16T08:38:12.4164019Z   File "/sonic/src/sonic-config-engine/tests/test_multinpu_cfggen.py", line 364, in test_back_end_asic_acl
2022-05-16T08:38:12.4164534Z     output = json.loads(self.run_script(argument))
2022-05-16T08:38:12.4165238Z   File "/sonic/src/sonic-config-engine/tests/test_multinpu_cfggen.py", line 38, in run_script
2022-05-16T08:38:12.4165726Z     self.assertTrue(self.yang.validate(argument))
2022-05-16T08:38:12.4166106Z AssertionError: False is not true
2022-05-16T08:38:12.4166266Z 
2022-05-16T08:38:12.4166600Z ======================================================================
2022-05-16T08:38:12.4167108Z FAIL: test_back_end_asic_acl1 (tests.test_multinpu_cfggen.TestMultiNpuCfgGen)
2022-05-16T08:38:12.4167810Z ----------------------------------------------------------------------
2022-05-16T08:38:12.4168225Z Traceback (most recent call last):
2022-05-16T08:38:12.4168945Z   File "/sonic/src/sonic-config-engine/tests/test_multinpu_cfggen.py", line 369, in test_back_end_asic_acl1
2022-05-16T08:38:12.4169686Z     output = json.loads(self.run_script(argument))
2022-05-16T08:38:12.4170394Z   File "/sonic/src/sonic-config-engine/tests/test_multinpu_cfggen.py", line 38, in run_script
2022-05-16T08:38:12.4170900Z     self.assertTrue(self.yang.validate(argument))
2022-05-16T08:38:12.4171271Z AssertionError: False is not true
2022-05-16T08:38:12.4171429Z 

@liat-grozovik changed the title from "Fix issue: error message from system-health daemon is observed during system starting" to "[Mellanox] Fix issue: error message from system-health daemon is observed during system starting" on May 16, 2022
@stephenxs (Collaborator Author)

The build failure is caused by PR #9700 having been cherry-picked to 202111.

@ganglyu (Contributor)

ganglyu commented May 18, 2022

The build failure is caused by PR #9700 being cherry-picked to 202111

Please update and try again.

@liat-grozovik changed the title from "[Mellanox] Fix issue: error message from system-health daemon is observed during system starting" to "[202111] [Mellanox] Fix issue: error message from system-health daemon is observed during system starting" on May 18, 2022
@liat-grozovik (Collaborator)

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs (Collaborator Author)

/azpw run azure.sonic-buildimage

@mssonicbld (Collaborator)

/AzurePipelines run azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs (Collaborator Author)

/azpw run azure.sonic-buildimage

@mssonicbld (Collaborator)

/AzurePipelines run azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik merged commit 2aa1c2f into sonic-net:202111 on May 21, 2022
@dprital removed the "Request for 202111 Branch" label on May 22, 2022
@stephenxs deleted the fix-system-health branch on November 3, 2022 00:25
7 participants